Trends

By Evgenia "Jenny" Nitishinskaya and Delaney Granizo-Mackenzie

Notebook released under the Creative Commons Attribution 4.0 License.


Trend models estimate tendencies in data over time, such as an overall rise or fall amid noise. They use only historical data, not any knowledge about the processes generating it.

Linear trend models

A linear trend model assumes that the variable changes at a constant rate with time, and attempts to find a line of best fit. We want to find coefficients $b_0$, $b_1$ such that the series $y_t$ satisfies $$ y_t = b_0 + b_1t + \epsilon_t $$ and so that the sum of the squares of the errors $\epsilon_t$ is minimized. This can be done using a linear regression. After we have fitted a linear model to our data, we predict the value of the variable to be $y_t = b_0 + b_1 t$ for future time periods $t$. We can also use these parameters to compare the rates of growth or decay of two data series.
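
As a minimal sketch of the idea, here is a fit on synthetic data with a known trend (illustrative only, using numpy's `polyfit` in place of a full regression package):

```python
import numpy as np

# Synthetic series with a known linear trend: y_t = 5 + 0.3t + noise
# (illustrative data only -- not the real prices used below)
rng = np.random.default_rng(0)
t = np.arange(250)
y = 5.0 + 0.3 * t + rng.normal(0, 2.0, size=t.size)

# Least-squares fit of y_t = b0 + b1*t; np.polyfit returns [b1, b0]
b1, b0 = np.polyfit(t, y, 1)

# Forecast: plug a future time index into the fitted line
y_forecast = b0 + b1 * 300
```

With enough observations, the recovered `b0` and `b1` should land close to the true values 5 and 0.3 despite the noise.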

Let's find a linear trend model for the price of XLY, an ETF for consumer goods.


In [ ]:
import numpy as np
import math
from statsmodels import regression
import statsmodels.api as sm
import matplotlib.pyplot as plt

In [227]:
start = '2010-01-01'
end = '2015-01-01'
asset = get_pricing('XLY', fields='price', start_date=start, end_date=end)
dates = asset.index

def linreg(X,Y):
    # Running the linear regression
    x = sm.add_constant(X)
    model = regression.linear_model.OLS(Y, x).fit()
    a = model.params[0]
    b = model.params[1]

    # Return summary of the regression and plot results
    X2 = np.linspace(X.min(), X.max(), 100)
    Y_hat = X2 * b + a
    plt.plot(X2, Y_hat, 'r', alpha=0.9);  # Add the regression line, colored in red
    return model.summary()

_, ax = plt.subplots()
ax.plot(asset)
ticks = ax.get_xticks()
ax.set_xticklabels([dates[i].date() for i in ticks[:-1]]) # Label x-axis with dates
linreg(np.arange(len(asset)), asset)


Out[227]:
OLS Regression Results
==============================================================================
Dep. Variable:     Security(19662 [XLY])   R-squared:                   0.944
Model:                               OLS   Adj. R-squared:              0.944
Method:                    Least Squares   F-statistic:             2.127e+04
Date:                   Wed, 17 Jun 2015   Prob (F-statistic):           0.00
Time:                           18:33:54   Log-Likelihood:            -3168.7
No. Observations:                   1258   AIC:                         6341.
Df Residuals:                       1256   BIC:                         6352.
Df Model:                              1
Covariance Type:               nonrobust
==============================================================================
                 coef    std err          t      P>|t|    [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         26.5739      0.169    156.852      0.000      26.242    26.906
x1             0.0340      0.000    145.846      0.000       0.034     0.034
==============================================================================
Omnibus:                     185.497   Durbin-Watson:                   0.025
Prob(Omnibus):                 0.000   Jarque-Bera (JB):               55.107
Skew:                         -0.236   Prob(JB):                     1.08e-12
Kurtosis:                      2.090   Cond. No.                     1.45e+03
==============================================================================

The regression summary gives us the slope and intercept of the line, along with some statistics on how valid the fit is. Note that the Durbin-Watson statistic is very low here, suggesting that the errors are serially correlated. The price of this fund is generally increasing, but because of the variance in the data, the line of best fit changes significantly depending on the sample we take. Because small errors in our model magnify with time, its predictions far into the future may not be as good as the fit statistics suggest. For instance, we can see what happens if we fit a model to the data through 2012 and use it to predict the data through 2014.


In [311]:
# Take only some of the data in order to see how predictive the model is
asset_short = get_pricing('XLY', fields='price', start_date=start, end_date='2013-01-01')

# Running the linear regression
x = sm.add_constant(np.arange(len(asset_short)))
model = regression.linear_model.OLS(asset_short, x).fit()
X2 = np.linspace(0, len(asset), 100)
Y_hat = X2 * model.params[1] + model.params[0]

# Plot the data for the full time range
_, ax = plt.subplots()
ax.plot(asset)
ticks = ax.get_xticks()
ax.set_xticklabels([dates[i].date() for i in ticks[:-1]]) # Label x-axis with dates

# Plot the regression line extended to the full time range
ax.plot(X2, Y_hat, 'r', alpha=0.9);


Of course, we can keep updating our model as we go along. Below we use all the previous prices to predict prices 30 days into the future.


In [309]:
# Y_hat will be our predictions for the price
Y_hat = [0]*1100

# Start analysis from day 100 so that we have historical prices to work with
for i in range(100,1200):
    temp = asset[:i]
    x = sm.add_constant(np.arange(len(temp)))
    model = regression.linear_model.OLS(temp, x).fit()
    # Plug (i+30) into the linear model to get the predicted price 30 days from now
    Y_hat[i-100] = (i+30) * model.params[1] + model.params[0]

_, ax = plt.subplots()
ax.plot(asset[130:1230]) # Plot the asset starting from the first day we have predictions for
ax.plot(range(len(Y_hat)), Y_hat, 'r', alpha=0.9)
ticks = ax.get_xticks()
ax.set_xticklabels([dates[i].date() for i in ticks[:-1]]) # Label x-axis with dates


Log-linear trend models

A log-linear trend model attempts to fit an exponential curve to a data set: $$ y_t = e^{b_0 + b_1 t + \epsilon_t} $$

To find the coefficients, we can run a linear regression on the equation $ \ln y_t = b_0 + b_1 t + \epsilon_t $ with variables $t, \ln y_t$. (This is the reason for the name of the model — the equation is linear when we take the logarithm of both sides!)
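
The steps above can be sketched on synthetic data with known coefficients (again using numpy's `polyfit` as a stand-in for a full regression package):

```python
import numpy as np

# Synthetic exponential series: y_t = exp(1.0 + 0.01t + noise)
rng = np.random.default_rng(1)
t = np.arange(500)
y = np.exp(1.0 + 0.01 * t + rng.normal(0, 0.05, size=t.size))

# Regressing ln(y) on t recovers the coefficients in the exponent
b1, b0 = np.polyfit(t, np.log(y), 1)

# Back-transform to get the fitted exponential curve
y_hat = np.exp(b0 + b1 * t)
```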

If $b_1$ is very small, then a log-linear curve is approximately linear. For instance, we can find a log-linear model for our data from the previous example, with fit statistics approximately the same as for the linear model.


In [189]:
def loglinreg(X,Y):
    # Running the linear regression on X, log(Y)
    x = sm.add_constant(X)
    model = regression.linear_model.OLS(np.log(Y), x).fit()
    a = model.params[0]
    b = model.params[1]

    # Return summary of the regression and plot results
    X2 = np.linspace(X.min(), X.max(), 100)
    Y_hat = np.exp(X2 * b + a)
    plt.plot(X2, Y_hat, 'r', alpha=0.9);  # Add the regression curve, colored in red
    return model.summary()

_, ax_log = plt.subplots()
ax_log.plot(asset)
ticks = ax_log.get_xticks()
ax_log.set_xticklabels([dates[i].date() for i in ticks[:-1]]) # Label x-axis with dates
loglinreg(np.arange(len(asset)), asset)


Out[189]:
OLS Regression Results
==============================================================================
Dep. Variable:     Security(19662 [XLY])   R-squared:                   0.959
Model:                               OLS   Adj. R-squared:              0.959
Method:                    Least Squares   F-statistic:             2.913e+04
Date:                   Wed, 17 Jun 2015   Prob (F-statistic):           0.00
Time:                           18:09:46   Log-Likelihood:             1890.5
No. Observations:                   1258   AIC:                        -3777.
Df Residuals:                       1256   BIC:                        -3767.
Df Model:                              1
Covariance Type:               nonrobust
==============================================================================
                 coef    std err          t      P>|t|    [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          3.3868      0.003   1115.335      0.000       3.381     3.393
x1             0.0007   4.18e-06    170.663      0.000       0.001     0.001
==============================================================================
Omnibus:                      36.939   Durbin-Watson:                   0.041
Prob(Omnibus):                 0.000   Jarque-Bera (JB):               22.844
Skew:                         -0.186   Prob(JB):                     1.10e-05
Kurtosis:                      2.454   Cond. No.                     1.45e+03
==============================================================================

In some cases, however, a log-linear model clearly fits the data better.


In [193]:
start2 = '2002-01-01'
end2 = '2012-06-01'
asset2 = get_pricing('AAPL', fields='price', start_date=start2, end_date=end2)
dates2 = asset2.index

_, ax2 = plt.subplots()
ax2.plot(asset2)
ticks2 = ax2.get_xticks()
ax2.set_xticklabels([dates2[i].date() for i in ticks2[:-1]]) # Label x-axis with dates
loglinreg(np.arange(len(asset2)), asset2)


Out[193]:
OLS Regression Results
==============================================================================
Dep. Variable:        Security(24 [AAPL])   R-squared:                  0.935
Model:                                OLS   Adj. R-squared:             0.935
Method:                     Least Squares   F-statistic:            3.783e+04
Date:                    Wed, 17 Jun 2015   Prob (F-statistic):          0.00
Time:                            18:13:08   Log-Likelihood:           -876.59
No. Observations:                    2624   AIC:                        1757.
Df Residuals:                        2622   BIC:                        1769.
Df Model:                               1
Covariance Type:                nonrobust
==============================================================================
                 coef    std err          t      P>|t|    [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.0634      0.013      4.807      0.000       0.038     0.089
x1             0.0017   8.71e-06    194.500      0.000       0.002     0.002
==============================================================================
Omnibus:                    1022.810   Durbin-Watson:                   0.005
Prob(Omnibus):                 0.000   Jarque-Bera (JB):              145.228
Skew:                          0.179   Prob(JB):                     2.91e-32
Kurtosis:                      1.905   Cond. No.                     3.03e+03
==============================================================================

Summary

From the above, we see that trend models can provide a simple representation of a complex data series. However, the errors (deviations from the model) are highly serially correlated, so we cannot apply the usual regression statistics to judge the fit, since the regression model assumes serially uncorrelated errors. That same correlation also suggests that it could be exploited to build a more refined model.
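
The Durbin-Watson statistic can be computed directly from residuals; here is a minimal numpy sketch on synthetic residuals (not the price series above) contrasting the two regimes:

```python
import numpy as np

# Durbin-Watson statistic: DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)
# Values near 2 suggest uncorrelated residuals; values near 0 suggest
# strong positive serial correlation.
def durbin_watson(residuals):
    diff = np.diff(residuals)
    return np.sum(diff ** 2) / np.sum(residuals ** 2)

rng = np.random.default_rng(2)

# White-noise residuals: DW should land near 2
dw_white = durbin_watson(rng.normal(size=1000))

# Random-walk residuals (highly autocorrelated, as a trend model fit to
# prices tends to produce): DW should land near 0
dw_walk = durbin_watson(np.cumsum(rng.normal(size=1000)))
```

The very low Durbin-Watson values in the summaries above are what this second regime looks like in practice.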